Abstract. Nomadic applications create replicas of shared objects that 
evolve independently while they are disconnected. When reconnecting, 
the system has to reconcile the divergent replicas. In the log-based ap- 
proach to reconciliation, such as in the IceCube system, the input is a 
common initial state and logs of actions that were performed on each 
replica. The output is a consistent global schedule that maximises the 
number of accepted actions. The reconciler merges the logs according to 
the schedule, and replays the operations in the merged log against the 
initial state, yielding a reconciled common final state. 
In this paper, we show the NP-completeness of the log-based reconcil- 
iation problem and present two programs for solving it. Firstly, a con- 
straint logic program (CLP) that uses integer constraints for expressing 
precedence constraints, boolean constraints for expressing dependencies 
between actions, and some heuristics for guiding the search. Secondly, a 
stochastic local search method with Tabu heuristic (LS), that computes 
solutions in an incremental fashion but does not prove optimality. One 
difficulty in the LS modeling lies in the handling of both boolean vari- 
ables and integer variables, and in the handling of the objective function 
which differs from a max-CSP problem. Preliminary evaluation results 
indicate better performance for the CLP program which, on somewhat 
realistic benchmarks, finds nearly optimal solutions for up to a thousand 
actions and proves optimality for up to a hundred actions. 



1 Introduction 



Data replication is a standard technique in distributed systems for making data 
available at different sites. The sites may be disconnected (mobile computing) 
or connected (groupware), in which case shared data are replicated for 
efficiency, in order to avoid access through the network. Obviously, the 
replication of mutable shared data may cause conflicts: the replicas may diverge 
into inconsistent states that have to be reconciled. Nomadic applications create 
replicas of shared objects that evolve independently while they are disconnected; 
when reconnecting, the system has to reconcile the divergent replicas. 



What constitutes a conflict depends on the semantics of the application and 
on the user's intent. For example, in the version management system CVS, 
write actions are said to conflict if and only if they occur in the same line of the 
same text file. Accordingly, many existing reconcilers are restricted to specific 
data types, such as source files, file systems or calendars. In contrast, in 
the log-based approach to reconciliation, best exemplified in the IceCube 
system, the input is a common initial state and logs of actions that were 
performed on each replica. In this setting, an action is composed of a precondition, 
an operation and a postcondition. The output is a consistent global schedule 
that maximises the number of accepted actions. The reconciler merges the logs 
according to the schedule, and replays the operations in the merged log against 
the initial state, yielding a reconciled common final state. 

In this paper, we show the NP-hardness of this reconciliation problem, by 
encoding SAT as a reconciliation problem, and study two programs for solving 
it. In Section 3 we present a constraint logic program (CLP) that uses boolean 
constraints for expressing dependencies between actions, constraints over 
integers for expressing precedence constraints, and some heuristics for guiding the 
search during branch-and-bound optimization. We evaluate the performance of 
this program on a set of randomly generated benchmarks, intended to model 
realistic log-based reconciliation problems, with densities (defined as the ratio 
between the number of constraints and the number of variables) of 1.5 for both 
precedence and dependency constraints between actions. At these densities the 
CLP program finds quasi-optimal solutions for up to a thousand actions, and 
proves optimality for up to a hundred actions. 

In Section 4 we present another program based on a stochastic local search 
method with Tabu heuristics. It computes solutions in an incremental fashion, 
can use the initial logs of the application as starting solution, but does not 
prove optimality. One difficulty in the LS modeling lies in the handling of both 
boolean variables and integer variables, and in the handling of the objective 
function which differs from a max-CSP problem. Preliminary evaluation results 
indicate better performance for the CLP program. 

In the last section we present our conclusion and on-going work. 



1.1 Related work 

Log-based reconciliation is a new topic for which few algorithms have been 
developed. The only implementation we know of is the IceCube system. 
It is worth noting that the objective function of maximizing the number of 
accepted actions is different from maximizing the number of satisfied constraints. 
For that reason, the modeling of log-based reconciliation as a max-CSP problem 
is inadequate. This is also the main reason why, in our second program based 
on local search, the min-conflict heuristic [11] and the adaptive search method 
do not perform well in our modeling, and we use a randomized Tabu 
heuristic instead. 

Elsewhere, we investigate the average-time complexity of the CLP program as a 
function of both densities for dependency and precedence constraints between 
actions. We demonstrate the existence of a single computational complexity peak 
on randomly generated problems, around densities 7 for precedence constraints 
and for dependency constraints between actions. Around this peak we observe 
phase transitions in the two dimensions of the density, where the mean running 
time of the program shifts from polynomial order to exponential. This 
experimental investigation of the average-case complexity of the CLP program 
is of quite general interest for the design of log-based reconciliation algorithms. 
In particular it indicates where the hardest problems are, and it clearly 
shows that it is crucial to use dependency constraints in an active way when 
computing a schedule, as these constraints greatly reduce the complexity of the 
problem. 

2 The log-based reconciliation problem 

2.1 Statement of the optimization problem 

We have to reconcile a set of logs of actions that have been performed 
independently, by trying to accept the greatest possible number of actions. 

Input: A finite set of $L$ initial logs of actions $\{[T_i^1, \ldots, T_i^{n_i}] \mid 1 \le i \le L\}$, some 
dependencies between actions $T_i^j \rightarrow T_k^l$, meaning that if $T_i^j$ is accepted then $T_k^l$ 
must be accepted, and some precedence constraints $T_i^j < T_k^l$, meaning that if 
both actions are accepted they must be executed in that order. The precedence 
constraints are assumed to be satisfied inside the initial logs. 

Output: A subset of accepted actions, of maximal cardinality, satisfying the 
dependency constraints, given with a global schedule $T_i^j < \ldots < T_k^l$ satisfying 
the precedence constraints. 

Note that the output depends solely on the precedence constraints between 
actions given in the input. In particular it is independent of the precise structure 
of the initial logs. The initial consistent logs can thus be used as starting solutions 
in some algorithms but can be forgotten as well without affecting the output. 

2.2 Complexity 

Proposition 1. The decision problem, i.e. finding a schedule of a given length, 
is NP-complete, even without dependency constraints. 

Proof. The decision problem is obviously in NP. Indeed, for any guessed schedule, 
one can check in polynomial time whether the schedule is consistent. 

NP-completeness is shown by encoding SAT into a reconciliation problem 
with singleton initial logs and precedence constraints only. 

Let us assume a SAT problem over $N$ boolean variables with $C$ clauses. 
For each boolean variable $p$, we associate $2 * C$ actions $p_0^1, p_1^1, \ldots, p_0^C, p_1^C$, with 
precedence constraints $p_0^i < p_1^j$ and $p_1^j < p_0^i$ for all clause indices $i, j$ in $[1, C]$. 
The actions $p_0^i$ and $p_1^j$ are thus mutually exclusive for all clause indices $i, j$. We 
represent the valuation false for $p$ by accepting the actions $p_0^i$ for all $1 \le i \le C$, 
and the valuation true by accepting $p_1^i$ for all $1 \le i \le C$. This defines a one-to-one 
mapping $\sigma$ between valuations over $N$ boolean variables, and the accepted 
actions in schedules of length $N * C$ satisfying the mutual exclusion constraints. 

For each clause, such as $p \vee q \vee \neg r$, we associate the precedence constraints 
$p_0^i < q_0^i < r_1^i < p_0^i$ where $i$ is the index of the clause. Being cyclic, these precedence 
constraints forbid taking simultaneously the actions $p_0^i$, $q_0^i$ and $r_1^i$, that is, 
they encode the equivalent formula $\neg(\neg p \wedge \neg q \wedge r)$. Hence a valuation $\eta$ satisfies 
a clause if and only if the actions in $\sigma(\eta)$ satisfy all the precedence constraints 
associated to the clause. Note that, unlike the mutual exclusion constraints, the 
precedence constraints are posted between action variables with the same clause 
index only. 

Now we prove that a set of $C$ clauses over $N$ variables is satisfiable if and 
only if there exists a schedule accepting $N * C$ actions and satisfying the mutual 
exclusion constraints and the precedence constraints associated to the clauses. 

The implication is clear: if $\eta$ is a valuation which satisfies all the clauses, then 
$\sigma(\eta)$ is a set of $N * C$ actions which satisfies the mutual exclusion constraints, 
and which can be ordered with increasing clause indices and according to the 
precedence constraints for variables with the same clause index. 

For the converse, let us suppose that we have a consistent schedule of $N * C$ 
actions. Because of the mutual exclusion constraints, the schedule defines a 
valuation of the SAT problem: indeed for each propositional variable $p$, either 
$p_0^i$ is accepted for all $i$, and $p$ is false, or $p_1^i$ is accepted for all $i$, and $p$ is true. 
Furthermore the precedence constraints between actions of index $i$ establish that 
this valuation satisfies the $i$th clause. Therefore the valuation associated to the 
schedule satisfies all the clauses. QED. 
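
As an illustration only, the encoding used in the proof can be sketched in Python. This is not part of the original development: the function names (`encode_sat`, `is_schedulable`, `sigma`, `satisfiable_via_reconciliation`) and the DIMACS-like clause format are our own assumptions. The sketch builds the mutual exclusion and clause constraints, then checks each valuation by testing that the precedence graph restricted to the accepted actions is acyclic. It enumerates all valuations, so it only serves to check the correspondence on tiny formulas.

```python
from itertools import product


def encode_sat(num_vars, clauses):
    """Build the actions and precedence constraints of the reduction.

    A clause is a list of non-zero ints: 3 stands for x3, -3 for (not x3).
    An action is a triple (variable, polarity, clause_index).
    """
    C = len(clauses)
    actions = [(v, b, i) for v in range(1, num_vars + 1)
               for b in (0, 1) for i in range(C)]
    prec = set()
    # Mutual exclusion: p_0^i < p_1^j and p_1^j < p_0^i for all i, j.
    for v in range(1, num_vars + 1):
        for i in range(C):
            for j in range(C):
                prec.add(((v, 0, i), (v, 1, j)))
                prec.add(((v, 1, j), (v, 0, i)))
    # One precedence cycle per clause, over the actions falsifying its literals.
    for i, clause in enumerate(clauses):
        falsifying = [(abs(lit), 0 if lit > 0 else 1, i) for lit in clause]
        for k, a in enumerate(falsifying):
            prec.add((a, falsifying[(k + 1) % len(falsifying)]))
    return actions, prec


def is_schedulable(accepted, prec):
    """True if the constraints restricted to the accepted actions are acyclic."""
    accepted = set(accepted)
    succ = {a: [] for a in accepted}
    indeg = {a: 0 for a in accepted}
    for a, b in prec:
        if a in accepted and b in accepted:
            succ[a].append(b)
            indeg[b] += 1
    ready = [a for a in accepted if indeg[a] == 0]
    done = 0
    while ready:
        a = ready.pop()
        done += 1
        for b in succ[a]:
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    return done == len(accepted)


def sigma(valuation, num_clauses):
    """Actions accepted for a valuation given as {variable: bool}."""
    return [(v, int(val), i) for v, val in valuation.items()
            for i in range(num_clauses)]


def satisfiable_via_reconciliation(num_vars, clauses):
    """Look for a schedule of N * C actions by enumerating valuations."""
    _, prec = encode_sat(num_vars, clauses)
    for bits in product([False, True], repeat=num_vars):
        valuation = dict(zip(range(1, num_vars + 1), bits))
        if is_schedulable(sigma(valuation, len(clauses)), prec):
            return True
    return False
```

For instance, the single clause $p \vee q \vee \neg r$ is written `[[1, 2, -3]]` in this format.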

3 A CLP(FD,B) approach 

3.1 Modeling with mixed boolean and integer constraints 

In this modeling of the problem, we forget the initial (consistent) logs of actions 
and consider that all actions are at the same level. We have n elementary actions 
to which we associate: 

- n boolean variables $a_1, \ldots, a_n$ which say whether the action is accepted or 
not 

- n integer variables $p_1, \ldots, p_n$ which give the position of the accepted actions 
in the global schedule 

We have some dependency constraints 

$a_i \Rightarrow a_j$ 

and some precedence constraints 

$a_i \wedge a_j \Rightarrow (p_i < p_j)$ 

or equivalently, assuming false is 0 and true is 1, 

$a_i * a_j * p_i < p_j$ 

(since positions are at least 1, this constraint holds trivially when either action 
is rejected). We want to maximize $a_1 + \ldots + a_n$. 

The search for solutions goes through an enumeration of the boolean variables 
$a_i$, with the heuristic of instantiating first the variable $a_i$ which has the 
greatest number of constraints on it (i.e. first-fail principle w.r.t. the number 
of posted constraints) and trying first the value 1 (i.e. best-first search for the 
maximization problem). 

This leads to the following straightforward CLP(FD,B) program (given in 
GNU-Prolog syntax): 

% solve(+Transactions, +Dependencies, +Precedences, -Schedule)
solve(Transactions, Dependencies, Precedences, Schedule) :-
    length(Transactions, N),
    length(La, N), fd_domain_bool(La),      % La: acceptance booleans
    length(Lp, N), fd_domain(Lp, 1, N),     % Lp: positions in the schedule
    dependencies(Dependencies, Transactions, La),
    precedences(Precedences, Transactions, La, Lp),
    sum(La, S),
    fd_maximize(
        fd_labeling(La, [variable_method(most_constrained),
                         reorder(true),
                         value_method(max)]),
        S),
    fd_labeling(Lp, [value_method(min)]),
    schedule(La, Lp, Transactions, Keysort),
    sort(Keysort, Schedule).

% Post A #==> B for each dependency X #==> Y.
dependencies([], _, _).
dependencies([(X #==> Y)|L], T, La) :-
    nth(I, T, X), nth(I, La, A),
    nth(J, T, Y), nth(J, La, B),
    A #==> B,
    dependencies(L, T, La).

% Post A*B*P #< Q for each precedence X #< Y.
precedences([], _, _, _).
precedences([(X #< Y)|L], T, La, Lp) :-
    nth(I, T, X), nth(I, La, A), nth(I, Lp, P),
    nth(J, T, Y), nth(J, La, B), nth(J, Lp, Q),
    A*B*P #< Q,
    precedences(L, T, La, Lp).

sum([], 0).
sum([B|L], S) :- S #= B+R, sum(L, R).

% Keep only the accepted actions, keyed by their position.
schedule([], [], [], []).
schedule([B|La], [P|Lp], [T|Tr], S) :-
    (   B = 0
    ->  schedule(La, Lp, Tr, S)
    ;   S = [(P-T)|R], schedule(La, Lp, Tr, R)
    ).



Note that the labeling done in the optimization predicate proceeds through 
the boolean variables only. It is well known indeed that interval propagation 
algorithms provide a complete procedure for checking the satisfiability of 
precedence constraints. It is thus not necessary to enumerate the possible values 
of the position variables in the schedule, as we know that the earliest dates 
are consistent. The labeling of the positions is done outside the optimization 
predicate, just to compute a ground schedule by taking the earliest dates for 
each action, without backtracking. 
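
The same idea can be sketched outside of a constraint solver. The Python code below is a hypothetical, exhaustive reference implementation and not the CLP(FD,B) program itself: it enumerates the acceptance booleans only (trying acceptance first), prunes with a simple bound, and replaces interval propagation by an explicit acyclicity test that also yields the earliest positions. The names `earliest_positions` and `reconcile` are ours.

```python
def earliest_positions(accepted, precedences):
    """Earliest position (from 1) of each accepted action, or None if cyclic."""
    succ = {a: [] for a in accepted}
    indeg = {a: 0 for a in accepted}
    for a, b in precedences:
        if a in succ and b in succ:
            succ[a].append(b)
            indeg[b] += 1
    pos = {a: 1 for a in accepted}
    ready = [a for a in accepted if indeg[a] == 0]
    seen = 0
    while ready:
        a = ready.pop()
        seen += 1
        for b in succ[a]:
            pos[b] = max(pos[b], pos[a] + 1)   # b must come after a
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    return pos if seen == len(succ) else None


def reconcile(actions, dependencies, precedences):
    """Return a schedule (list of actions) of maximal length.

    dependencies: pairs (a, b) meaning "if a is accepted then b is accepted".
    precedences:  pairs (a, b) meaning "a before b if both are accepted".
    """
    best = []

    def search(i, accepted):
        nonlocal best
        # Bound: accepting every remaining action would not beat the incumbent.
        if len(accepted) + len(actions) - i <= len(best):
            return
        # A precedence cycle cannot be repaired by accepting more actions.
        if earliest_positions(accepted, precedences) is None:
            return
        if i == len(actions):
            chosen = set(accepted)
            if all(b in chosen for a, b in dependencies if a in chosen):
                best = list(accepted)
            return
        search(i + 1, accepted + [actions[i]])   # try "accepted" first
        search(i + 1, accepted)                  # then "rejected"

    search(0, [])
    pos = earliest_positions(best, precedences)
    return sorted(best, key=lambda a: pos[a])
```

As in the CLP program, the positions are computed only once, after the optimal set of accepted actions is known, and several actions may share the same earliest position.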

3.2 Benchmarks 

At the present time, we do not have benchmarks of real-life log-based 
reconciliation problems. Nevertheless we expect that in real-life reconciliation problems, 
the optimal solutions typically accept more than 80% of the actions. These 
considerations, plus a preliminary inspection of some calendar applications and 
the jigsaw problem presented in [9], lead us to create a benchmark of randomly 
generated problems with density 1.5 for both precedence constraints and 
dependency constraints. We added a second series of more difficult benchmarks 
generated with the same density 1.5 for precedence constraints but without 
dependency constraints. 

Table 1 depicts the experimental results on both series of benchmarks. The 
size given in the second column is the total number of actions. The numbers 
of dependency constraints and precedence constraints are given in the following 
columns. These constraints are generated for each pair of actions randomly, with 
probability 1.5/size, which gives 1.5 * size constraints of each type on average. 
The second series of benchmarks contains no dependency constraints. 
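
A generator in this spirit can be sketched as follows. This is a hypothetical reconstruction, not the generator actually used for Table 1: we assume that "each pair" means each ordered pair of distinct actions, which is what yields about 1.5 * size constraints of each type, and the function name `random_instance` and its parameters are ours.

```python
import random


def random_instance(size, density=1.5, with_dependencies=True, seed=None):
    """Draw random dependency and precedence constraints over `size` actions.

    Each ordered pair (i, j) with i != j is selected independently with
    probability density / size (assumption: ordered pairs).
    """
    rng = random.Random(seed)
    p = density / size

    def draw():
        return [(i, j) for i in range(size) for j in range(size)
                if i != j and rng.random() < p]

    precedences = draw()
    dependencies = draw() if with_dependencies else []
    return dependencies, precedences
```

Calling `random_instance(100, with_dependencies=False)` gives an instance of the second, harder series.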

The running times in this section have been measured in GNU-Prolog on 
an 866 MHz Pentium III PC under Linux. We indicate, in order, the number of 
accepted actions in the first schedule found, the running time for finding this 
solution, the number of accepted actions in the optimal schedule, the running 
time for finding the optimal schedule (from the start), and finally the total 
running time including the proof of optimality. 

The first solution found in these benchmarks is always very near the optimal 
solution. This indicates that the most-constrained variable choice heuristic 
combined with the best-first search heuristic performs very well in this 
modeling of the problem. Optimal solutions with their optimality proof are always 
computed in less than a second for problems with up to a hundred actions. 
On larger problems optimality proofs become difficult to obtain, but the first 
solution found is always satisfactory and fast to compute. The problems without 
dependency constraints are harder to solve. Optimality proofs are nevertheless 
always obtained on problems of size up to 50. 

Table 1. Experimental results. 

Fig. 1. Screen dump of the local search program. 



4 A stochastic local search approach 

Approximate solutions to hard optimization problems can be found with 
heuristics. Local search is one of the fundamental heuristics and has been shown 
to be particularly effective for many classes of applications, including some 
classical benchmarks of constraint programming [10], [7]. Local search methods 
iterate the transformation of some initial solution $s_0$, by choosing at each step 
the next solution, $s_{i+1}$, in some neighborhood $N(s_i)$ of $s_i$. Descent methods 
choose at each iteration the neighbour which minimizes the objective function 
$f$: $s_{i+1} = \arg\min_{s \in N(s_i)} f(s)$, and stop whenever $f(s_{i+1}) \geq f(s_i)$. Descent 
methods thus stop in the first encountered local optimum. Multi-start methods iterate 
the application of the descent method from different initial solutions, and stop 
with the best encountered local optimum. Simulated annealing, variable 
neighborhood methods and Tabu search are local search methods that escape from 
local minima in an iterative fashion, without restarting descents. 
Tabu search consists in choosing at each step $i$ the state 

$s_{i+1} = \arg\min_{s \in N(s_i)} f(s)$ 

even if $f(s_{i+1})$ is greater than the value of the best solution found. A Tabu list $L$ records 
the already visited states. The Tabu states cannot be revisited, except if they 
improve the objective function. Furthermore the Tabu list is a short term 
memory: after some number of iterations, already visited states are deleted from the 
Tabu list and can be freely reconsidered. In order to prevent cycles in the same 
set of solutions, the size of the Tabu list, as well as the neighborhoods, can be 
changed dynamically during the iterations. 
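
The generic descent method described above can be sketched in a few lines of Python. This is only an illustration under our own naming (`descent`, `line_neighbours`), with a toy cost table chosen to show a descent stopping in a local optimum.

```python
def descent(s0, neighbours, f):
    """Greedy descent: move to the best neighbour until none improves f."""
    s = s0
    while True:
        candidates = list(neighbours(s))
        if not candidates:
            return s
        best = min(candidates, key=f)
        if f(best) >= f(s):
            return s          # local optimum reached
        s = best


def line_neighbours(x, lo=0, hi=4):
    """Neighbours of x on the integer segment [lo, hi]."""
    return [y for y in (x - 1, x + 1) if lo <= y <= hi]


costs = [5, 3, 4, 1, 0]       # toy objective with a local minimum at index 1
local_opt = descent(0, line_neighbours, lambda x: costs[x])
global_opt = descent(2, line_neighbours, lambda x: costs[x])
```

Started from 0, the descent is trapped at index 1, whereas started from 2 it reaches the global minimum at index 4. This is the behaviour that multi-start and Tabu methods are designed to overcome.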

The log-based reconciliation problem can be modeled quite naturally as a 
local search problem for finding schedules that maximize the number of accepted 
actions. One difficulty, however, is to mix boolean variables and position variables, 
and to define an appropriate evaluation function for guiding the search. 

In this section we treat reconciliation problems with precedence constraints 
only. One solution is to represent in the states the positions of the actions 
in the current schedule, and to count in the objective function the number of 
actions which have all their precedence constraints satisfied. For guiding the 
search, however, a more refined evaluation function is needed in order to have a 
measure of progress towards better solutions that do not yet improve the 
objective function. We thus use an evaluation function that counts, for each violated 
precedence constraint 

$p_i < p_j$ 

the error 

$1 + (p_i - p_j)$ 

The local moves are simply the incrementation or the decrementation by 1 of 
the position of an action in the schedule. 
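
As a small worked example, the evaluation function and the local moves can be written as follows in Python. The representation of a configuration as a dictionary from actions to integer positions, and the names `constraint_error`, `evaluation` and `move`, are assumptions made for this sketch.

```python
def constraint_error(pos, i, j):
    """Error of the constraint p_i < p_j: 0 if satisfied, else 1 + (p_i - p_j)."""
    return 0 if pos[i] < pos[j] else 1 + (pos[i] - pos[j])


def evaluation(pos, precedences):
    """Sum of the errors of all precedence constraints."""
    return sum(constraint_error(pos, i, j) for i, j in precedences)


def move(pos, action, delta):
    """Return a copy of pos where `action` is shifted by delta (+1 or -1)."""
    assert delta in (-1, 1)
    new_pos = dict(pos)
    new_pos[action] += delta
    return new_pos


pos = {"a": 3, "b": 1, "c": 2}
precedences = [("a", "b"), ("b", "c")]
before = evaluation(pos, precedences)                 # a < b is violated: 1 + (3 - 1)
after = evaluation(move(pos, "a", -1), precedences)   # a moved to 2:      1 + (2 - 1)
```

Moving `a` one step earlier does not yet satisfy `a < b`, but the evaluation decreases from 3 to 2, which is the kind of progress the plain objective function cannot measure.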

The min-conflict heuristic [11] consists in choosing for a move the variable 
with the highest error and the move which minimizes that error. Nevertheless, in the 
reconciliation problem, improving the evaluation function on the variables with 
the highest error does not necessarily lead to a good solution w.r.t. the objective 
function, as actions with high errors can simply be rejected. Therefore we 
do not use the min-conflict heuristic and perform local moves on all actions as 
long as they improve the evaluation function. The value of a configuration w.r.t. 
the objective function is the number of actions with 0 errors. Note that this 
value is in fact a lower bound of the cost of the configuration w.r.t. the objective 
function, as constraints with unaccepted actions should be ignored. The cost of 
a configuration w.r.t. the objective function is thus computed as the number 
of actions remaining after successively removing the actions with the greatest 
number of violated constraints with non-removed actions. 
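
The two quantities discussed above can be sketched in Python as follows. The code is illustrative only: the names are ours, and the tie-breaking rule between actions with the same number of violated constraints (here, the smallest action first) is an assumption, since the text does not specify one.

```python
def violated(pos, i, j):
    """True if the precedence constraint p_i < p_j does not hold."""
    return not pos[i] < pos[j]


def zero_error_count(pos, precedences):
    """Number of actions involved in no violated constraint (lower bound)."""
    bad = set()
    for i, j in precedences:
        if violated(pos, i, j):
            bad.update((i, j))
    return len(pos) - len(bad)


def accepted_count(pos, precedences):
    """Number of actions left after repeatedly removing the action with the
    greatest number of violated constraints with non-removed actions."""
    alive = set(pos)
    while True:
        violations = {a: 0 for a in sorted(alive)}
        for i, j in precedences:
            if i in alive and j in alive and violated(pos, i, j):
                violations[i] += 1
                violations[j] += 1
        worst = max(violations, key=violations.get, default=None)
        if worst is None or violations[worst] == 0:
            return len(alive)
        alive.remove(worst)


pos = {0: 1, 1: 1, 2: 1, 3: 5}
precedences = [(0, 1), (0, 2), (3, 0)]
lower_bound = zero_error_count(pos, precedences)
cost = accepted_count(pos, precedences)
```

In this configuration every action takes part in a violated constraint, so the lower bound is 0, yet removing action 0 alone leaves three actions without any conflict.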

This defines the descent method evaluated in the next section. The Tabu 
search method adds an adaptive memory to escape from local minima. When a 
local minimum is reached on a variable, it is marked in the Tabu list for a while, 
and is not reconsidered except if it is the only way to improve the solution. 
In order to increase the diversification, the number of iterations for which the 
variable is marked Tabu is randomly generated between 1 and some maximum 
Tabu length value (10 in the experiments). Furthermore, if a local minimum is 
reached on all variables, one variable is chosen randomly to make a move that 
degrades the evaluation function. 
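
The following Python sketch shows one possible reading of this randomized Tabu scheme. It is not the Java program evaluated below: positions are unbounded integers, the random move taken at a local minimum is merely non-improving, and all names and default parameters other than the maximum Tabu length of 10 are our assumptions.

```python
import random


def total_error(pos, precedences):
    """Sum of 1 + (p_i - p_j) over the violated constraints p_i < p_j."""
    return sum(1 + pos[i] - pos[j] for i, j in precedences if pos[i] >= pos[j])


def tabu_search(start, precedences, iterations=1000, max_tabu=10, seed=0):
    """Return (positions, error) of the best configuration encountered."""
    rng = random.Random(seed)
    pos = list(start)
    n = len(pos)
    tabu_until = [0] * n          # action a is Tabu while tabu_until[a] > it
    best_pos, best_err = list(pos), total_error(pos, precedences)

    def best_unit_move(a):
        """Best of the two moves -1 / +1 on action a, as (error, delta)."""
        results = []
        for delta in (-1, 1):
            pos[a] += delta
            results.append((total_error(pos, precedences), delta))
            pos[a] -= delta
        return min(results)

    for it in range(iterations):
        if best_err == 0:
            break
        current = total_error(pos, precedences)
        was_tabu = [tabu_until[a] > it for a in range(n)]
        moved = False
        # First pass: non-Tabu actions. Second pass: Tabu actions, which are
        # reconsidered only when nothing else improves the configuration.
        for tabu_pass in (False, True):
            for a in range(n):
                if was_tabu[a] != tabu_pass:
                    continue
                err, delta = best_unit_move(a)
                if err < current:
                    pos[a] += delta
                    current = err
                    moved = True
                elif not tabu_pass:
                    # Local minimum on this action: Tabu for a random duration.
                    tabu_until[a] = it + 1 + rng.randint(1, max_tabu)
            if moved:
                break
        if not moved:
            # Local minimum on all actions: take a random non-improving move.
            a = rng.randrange(n)
            pos[a] += rng.choice((-1, 1))
            current = total_error(pos, precedences)
        if current < best_err:
            best_pos, best_err = list(pos), current
    return best_pos, best_err
```

The `start` argument can hold the positions derived from the initial logs, in line with the remark that local search may use them as a starting solution.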

The user interface of the program visualizes the movements of the actions in 
the schedule during the search (see Figure 1). The graphical interface represents 
each action on a different line. The position of the action in the line indicates 
the scheduling of the action. Precedence constraints are materialized by lines 
between actions: green lines represent satisfied precedence constraints, and 
yellow lines represent violated ones. 

4.1 Evaluation 

Table 2. Experimental results (benchmarks without dependency constraints). 



The local search algorithm described in the previous section has been 
implemented in Java. Table 2 depicts the performance results on the previous series 
of benchmarks without dependency constraints. The running times have been 
measured with the Java 2 v1.3 JDK compiler. For each benchmark we recall the 
number of accepted actions in the optimal solution found by the CLP program, and 
present the best solution found by the (deterministic) descent method, and the 
best solution found by a run of the (randomized) Tabu search method. 

These results indicate that the descent method leads to local minima of poor 
quality, and that the Tabu search program succeeds in escaping from these local 
minima. However, on large instances, the convergence to better solutions is very 
slow. 

Better tuning of the Tabu search method might still improve the performance 
of the method on these benchmarks, in particular by improving the 
diversification strategies. Experiments with the min-conflict heuristic 
gave worse results, for the reasons explained in the previous section. A different 
modeling seems necessary to use these heuristics, improve the handling of the 
objective function and treat dependency constraints. 

5 Conclusion 

Reconciliation problems in nomadic applications present interesting 
combinatorial optimization problems, with both static and on-line versions. We have 
studied an NP-hard log-based reconciliation problem. Its modeling with boolean 
and precedence constraints leads to a straightforward CLP(FD,B) program that 
finds nearly optimal solutions for up to a thousand actions, and proves 
optimality for up to a hundred actions on realistic benchmarks. Enumeration proceeds 
through the boolean variables only. The precedence constraints are propagated 
in a complete way, which makes enumeration superfluous. This program could 
still benefit, however, from more efficient algorithms for detecting cycles in 
precedence constraints, with a complexity independent of the size of the variables' 
domain. 

We have developed a second program based on local search with a Tabu 
heuristic. One potential advantage of the local search approach is that it can benefit 
from the initial logs as starting solution, and can provide solutions incrementally 
for the on-line reconciliation problem. Currently, however, this program performs 
poorly in comparison to the CLP program. One difficulty lies in the handling 
of both boolean variables and integer variables which represent the positions of 
the actions in the schedule. In our modeling of the problem, the adaptive search 
method did not perform well because of the difficulty of handling the 
objective function, which differs from a max-CSP problem. Other modelings are thus 
currently under investigation. 

Acknowledgment. I would like to thank Silvano Dal Zilio, Peter Dreuschel, 
Cedric Fournet, Anne-Marie Kermarrec, Marc Shapiro and Antony Rowstron for 